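A warp is a group of threads that execute together, each on its own thread processor (lane). A warp is typically 32 threads wide on NVIDIA hardware and 32 or 64 on AMD, where it is called a wavefront. All threads in a warp run the same instruction at the same time, each on its own data.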
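A toy Python model of this lockstep behaviour, assuming a 32-lane warp. `WARP_SIZE` and `run_warp` are made-up names for illustration, not part of any GPU API:

```python
WARP_SIZE = 32  # assumption: 32 lanes per warp; some GPUs use 64


def run_warp(instruction, lane_inputs):
    """Apply one instruction to every lane's data, as a warp does in one step."""
    if len(lane_inputs) != WARP_SIZE:
        raise ValueError("expected one input per lane")
    # Same instruction for every lane, different data per lane.
    return [instruction(x) for x in lane_inputs]


doubled = run_warp(lambda x: x * 2, list(range(WARP_SIZE)))
```

Real hardware runs the lanes in parallel; the list comprehension only models the "same instruction, different data" part.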
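Each core (a streaming multiprocessor on NVIDIA, a compute unit on AMD) has a block of on-chip memory that is split between L1 cache and shared memory. On recent NVIDIA architectures each core contains 4 processing blocks, each of which can run one warp at a time.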
A dispatched workgroup may be split across multiple warps, all of which run on the same core.
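A rough sketch of the arithmetic, assuming threads are assigned to warps in order of their linear index within the workgroup (this is how CUDA groups threads; other APIs may differ). The function names are hypothetical:

```python
def warps_for_workgroup(workgroup_size, warp_size=32):
    """Number of warps needed to hold a workgroup (rounded up)."""
    return (workgroup_size + warp_size - 1) // warp_size


def warp_and_lane(thread_index, warp_size=32):
    """Return (warp index, lane index) for a thread's linear index."""
    return divmod(thread_index, warp_size)


print(warps_for_workgroup(256))  # 256 threads fill 8 warps of 32
print(warps_for_workgroup(100))  # 100 threads need 4 warps; the last is partly empty
print(warp_and_lane(70))         # thread 70 is lane 6 of warp 2
```

A workgroup size that is not a multiple of the warp size leaves idle lanes in the last warp, which is why sizes such as 64, 128, or 256 are common choices.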
All cores share an L2 cache.
https://www.youtube.com/watch?v=whPSD8sdx-0